Enhanced Centroid-Based Classification Technique by Filtering Outliers

Authors

  • Kwangcheol Shin
  • Ajith Abraham
  • Sang-Yong Han

Abstract

Document clustering, or unsupervised document classification, has been used to enhance information retrieval. It has recently become an intense area of research because of its practical importance. Outliers are the elements whose similarity to the centroid of their category is below some threshold value. In this paper, we show that excluding outliers from noisy training data significantly improves the performance of the centroid-based classifier, which is the best known method. The proposed method performs about 10% better than the centroid-based classifier.
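The abstract does not give the procedure in detail, so the following is only a minimal sketch of the idea it describes: compute a centroid per category, discard training documents whose similarity to their own category's centroid falls below a threshold, and recompute the centroids from what remains. Cosine similarity, the single filtering pass, the function names, and the `threshold` parameter are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    na, nb = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(na, nb))


def centroid(vectors):
    """Component-wise mean of the unit-normalized vectors."""
    unit = [normalize(v) for v in vectors]
    dim = len(unit[0])
    return [sum(v[i] for v in unit) / len(unit) for i in range(dim)]


def train_filtered_centroids(docs, labels, threshold):
    """Build one centroid per category after dropping outliers.

    An outlier is a training vector whose cosine similarity to the
    initial centroid of its own category is below `threshold`
    (a hypothetical parameter; the paper's value is not given here).
    """
    by_class = defaultdict(list)
    for vec, label in zip(docs, labels):
        by_class[label].append(vec)

    centroids = {}
    for label, vecs in by_class.items():
        initial = centroid(vecs)
        kept = [v for v in vecs if cosine(v, initial) >= threshold]
        # Fall back to the unfiltered centroid if everything was dropped.
        centroids[label] = centroid(kept) if kept else initial
    return centroids


def classify(vec, centroids):
    """Assign the category whose centroid is most similar to `vec`."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))
```

In use, `docs` would hold term-weight vectors (for example TF-IDF, itself an assumption here) and `labels` the category of each document; `train_filtered_centroids(docs, labels, threshold)` returns the filtered centroids that `classify` compares against a new document vector.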

Similar resources

Robustified distance based fuzzy membership function for support vector machine classification

Fuzzification of the support vector machine has been used to deal with the problem of outliers and noise. This is achieved by means of a fuzzy membership function, which is generally built from the distance of the points to the class centroid. The focus of this research is twofold. Firstly, by taking advantage of robust statistics in the fuzzy SVM, more emphasis on reducing the im...

Improving kNN Text Categorization by Removing Outliers from Training Set

We show that excluding outliers from the training data significantly improves the kNN classifier, which in this case performs about 10% better than the best known method, the centroid-based classifier. Outliers are the elements whose similarity to the centroid of the corresponding category is below a threshold.

Image Quality Assessment and Outliers Filtering in an Image-Based Animal Supervision System

This paper presents a probabilistic framework for the image quality assessment (QA), and filtering of outliers, in an image-based animal supervision system (asup). The proposed framework recognizes asup’s imperfect frames in two stages. The first stage deals with the similarity analysis of the same-class distributions. The objective of this stage is to maximize the separability measures by defi...

Discrete Point Cloud Filtering And Searching Based On VGSO Algorithm

The massive point cloud data obtained through computer vision is uneven in density and contains a lot of noise and outliers, which greatly reduces point cloud search efficiency and affects surface reconstruction. To address this, the paper presents a filtering algorithm based on Voxel Grid Statistical Outlier (VGSO): firstly, a 3D voxel grid is created for the massive point cloud d...

Fuzzy Centroid-Based Method Applied to Customer Requirements Ranking in Diba Fiberglass Company

The purpose of this study is to introduce an application of the fuzzy centroid-based approach to ranking customer requirements using QFD with competition considerations for Diba Fiberglass, an Iranian company. The illustrated approach not only focuses on normal fuzzy numbers but also considers non-normal fuzzy numbers to capture the true customer requirements. To this end, first, we p...

Publication date: 2006